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ABSTRACT 

— '  OptimM^ra  algoiufitM.,^r]^ally  require  the  solution  of  many  systems  of  linear  equations 
=  kXwhe&4"8*  numbieirTrf  variables-Qt,..c^straints  are  present,  these  linear  systems 
coTud  account  Iot  much'oC^e  total  computation  time>>^ 

Both  direct  s^d  iteratiTe'Uqu^ion  solTers  are  needed  (practice.  Unfortunately,  most  of  the 
off-the  shelf  solTerV>tfe  designed  rot*  single  systems,  wherea^ptimisation  problems  giye  rise  to 
hundreds  or  thousandsof  systems.  To  ivtM  refactorisation, j>r  to  speed  the  convergence  of  an 
iterative  method,  it  is  ehq^^  to  note  tha^s  is  related  to 

review  various  spamMStrieet^^t  arise  in  optimisation,  and  discuss  compromises  that 
yo^e  currently  being  made  in  dealing  wit^'the^  Since  significant  advances  continue  to  be  made 
with  single-system  solver s,^iS^  give  special  attemten  to  methods  that  allow  such  solvers  to  be  used 
repeatedly  on  a  sequence  of  modified  systems  (e.g^^<^e  product-form  update;  use  of  the  Schur 
complement).  The  speed  of  factorising  a  matrix  then  orcomes  relatively  less  important  than  the 
efficiency  of  subsequent  mI^s  very  many  right-hand,  sides. 

At  the  same  time,  ^  we  he^e  that  hiture  improvemems  to  linear-equation  software  will  be 
oriented  more  specific^y  to  the  case  of  related  matrices  &. 
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1.  Introduction 

1.1.  Baekgroiind.  The  mi^or  opphcation  of  sparse  matrix  techniques  in  optimisation  up  to  the 
present  has  been  in  the  implementation  of  the  simplex  method  for  linear  programming  (LP)  (see, 
e.g.,  Dantsig,  1963).  In  fact,  commercial  codes  for  large  LP  problems  seem  to  hare  predated  codes 
for  sparse  linear  equations  (eren  though  solring  a  sparse  LP  problem  requires  solving  many  sparse 
linear  systems).  In  the  commercial  uforld  today,  more  sparse  matrix  computation  is  probably 
expended  on  linear  programs  than  on  any  other  type  of  problem,  and  linear  programs  involving 
thousands  of  unknowns  can  be  solved  routinely.  Because  of  the  great  success  of  the  simplex 
algorithm  and  the  wide  availability  of  LP  codes,  many  large-scale  optimisation  problems  tend 
to  be  formulated  as  purely  linear  programs.  However,  we  shall  see  that  this  limitation  is  often 
unnecessary. 

Before  considering  particular  methods,  we  emphasise  that  methods  for  large-scale  optimisa¬ 
tion  have  a  special  character  attributable  in  large  part  to  the  critical  importance  of  linear  algebraic 
procedures.  Since  dense  linear  algebraic  techniques  tend  to  become  unreasonably  expensive  as  the 
problem  dimension  increases,  it  is  usually  necessary  to  compromise  what  seems  to  be  an  “ideal” 
strategy.  (In  fact,  an  approach  that  would  not  even  be  considered  for  small  problems  may  turn 
out  to  be  the  best  choice  for  some  large  problems.)  Furthermore,  the  relative  cost  of  the  steps  of 
many  optimisation  methods  changes  when  the  problem  becomes  large.  For  example,  the  perfor¬ 
mance  of  unconstrained  optimisation  algorithms  is  often  measured  by  the  number  of  evaluations 
of  the  objective  function  required  for  convergence.  Although  simplistic,  this  is  a  reasonable  gauge 
of  effectiveness  for  most  problems  of  low  dimension  because  the  number  of  arithmetic  operations 
per  iteration  tends  to  be  small,  and  the  amount  of  work  required  for  storage  manipulation  is 
negligible.  However,  as  the  sise  of  the  problem  grows,  the  “housekeeping”  (cost  of  arithmetic  and 
data  structures)  becomes  comparable  to,  and  may  even  dominate,  the  cost  of  function  evaluations. 

Most  optimization  methods  are  iterative;  we  shall  consider  algorithms  in  which  the  (kH-  I)-th 
iterate  is  defined  as 

^k+l  =  **  +  OikPk,  (1-1) 

where  a*  is  a  non-negative  scalar,  and  the  n-vector  pk  is  called  the  search  direction.  One  of 
the  primary  applications  of  sparse  matrix  techniques  in  optimization  is  in  solving  one  or  more 
systems  of  linear  equations  to  obtain  pk. 

It  is  usual  for  thousands  of  iterations  to  be  required  to  solve  a  single  large  optimization 
problem,  and  hence  it  might  appear  that  the  computation  time  required  would  be  enormous,  even 
with  the  best  available  sparse  matrix  techniques.  Fortunately,  the  linear  systems  that  define  ps+i 
are  usually  closely  related  to  those  that  define  pn  (and  the  degree  of  closeness  can  be  controlled 
to  some  extent  by  the  choice  of  algorithm).  In  addition,  the  sequence  {x^}  will  often  converge 
to  the  solution  with  only  mild  conditions  on  {ps}-  Consequently,  there  is  a  certain  fiexibility 
in  the  definition  of  pk.  The  design  of  algorithms  for  large-scale  optimisation  problems  involves 
striking  a  balance  between  the  effort  expended  at  each  iteration  to  compute  pk  and  the  number 
of  iterations  required  for  convergence. 


1.2.  Summaiy.  The  three  main  subdivisions  of  optimisation  are  discussed  in  turn  (unconstrained, 
linearly  constrained,  and  nonlinearly  constrained).  A  common  denominator  is  the  need  to  solve 
many  qrstems  of  linear  equations,  and  the  need  to  update  various  factorisations  in  order  to 
deal  with  sequences  of  related  equations.  We  indicate  situations  where  off-the-shelf  software  can 
be  applied.  Symmetric  positive-definite  solvers  are  mainly  useful  for  unconstrained  problems, 
while  unsymmetric  solvers  are  essential  for  dealing  with  linear  constraints.  There  is  an  inevitable 
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emphuit  on  ttie  latter  became  most  large  optimiiation  problems  enrrentlj  being  solTed  infolTe 
sparse  linear  constraints. 

The  principal  updating  problem  is  that  of  replacing  one  column  of  a  square  matrix.  Hosrater, 
there  exists  on^  one  generally  available  package  for  updating  sparse  factors  in  situ.  We  therefore 
focm  on  method  that  allow  an  off-the-shelf  solrer  to  be  used  repeatedly  on  the  same  matrix  with 
different  right-hand  sides.  Such  methods  facilitate  more  general  updates  to  sparse  matrices.  In 
one  instance,  a  sparse  indefinite  solder  is  needed. 

The  final  section  on  nonliimar  constraints  covers  methods  that  solve  a  sequence  of  simpler 
subproblems,  to  which  the  preceding  comments  apply. 


f .  Unconstrained  Optimiiation 

2.1.  Methods  fbr  dense  problems.  The  unconstrained  optimisation  problem  involves  the  mini¬ 
misation  of  a  scalar-valued  objective  function,  i.e. 


minimise  F(x). 

•€n* 

We  assume  that  F  is  smooth;  let  g(x)  and  H(x)  denote  the  gradient  vector  and  Hessian  matrix 

dtF. 

Many  techniqoes  are  available  for  solving  unconstrained  problems  in  which  n  is  small  (for 
recent  surv^,  see,  e.g.,  Brodlie,  1977;  Fletcher,  1980;  Gill,  Murray  and  Wright,  1981).  The  most 
popular  methods  compute  the  search  direction  as  the  solution  of  a  system  of  linear  equations  of 
the  form 

HhPk  —  -—Skp  (2»1) 

where  §k  i*  tlM  gradient  of  F  at  Xk,  and  Hk  is  a  suitable  symmetric  matrix  that  is  most  often 
intend^  to  represent  (in  some  sense)  H(xk).  H  Hk  is  positive  definite,  the  solution  of  (2.1)  is  the 
step  to  the  minimum  of  the  local  quadratic  approximation  to  F  at 


minimjse  gip  +  -  p^Hkp. 


The  m^or  distinctions  among  algorithms  involve  the  definition  of  Hk. 

When  Hk  ii  the  exact  Hessian  at  x*  or  a  finite-difference  approximation,  the  algorithm 
based  on  solving  (2.1)  for  p*  is  called  a  Newton-type  method.  Newtonrtype  methods  tend  to  be 
powerful  and  robust  when  properly  implemented,  and  exhibit  quadratic  convergence  under  mild 
conditions.  However,  certain  difficulties  arise  when  Hk  is  indefinite,  since  the  quadratic  ftmetion 
(2.2)  is  unbounded  below  and  the  solution  of  (2.1)  may  be  undefined.  Numerous  strategies  have 
been  suggested  for  this  ease,  and  often  involve  defining  pk  as  the  solution  of  a  linear  system 
with  a  positive-definite  matrix  that  is  closely  related  to  the  Hessian.  These  techniques  include 
the  modified  Cholesky  factorisation  of  Gill  and  Murray  (1974)  and  various  trust-region  strate^es 
(see,  e.g.,  Mor4  and  Sorensen,  1982). 

When  an  exact  or  finite-difference  Hessian  is  unavailable  or  too  expensive,  a  popular  alterna¬ 
tive  is  to  use  a  quasi-Newton  method  (see  Dennis  and  Mor4, 1977,  for  a  survey).  In  a  quasi-Newton 
method,  the  matrix  Hk  is  an  approximation  to  the  Hessian  that  is  updated  ty  a  low-rank  change 
at  each  iteration,  baaed  on  information  about  the  change  in  the  gradient.  The  hope  is  that  the 
approximation  will  improve  as  the  iterations  proceed.  Quasi-Newton  methods  ^ically  display  a 
superlinear  rate  of  convergence  in  practice,  and  are  often  more  efficient  (in  terms  ot  computation 
time)  than  Nowtom>type  methods. 
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When  n  becomes  rerj  large,  tsfO  related  difficulties  can  occur  with  methods  that  soItb  (2.1) 
directly:  excessiTe  computation  time  and  insufficient  storage  for  the  nx  n  matrix  A*.  Fortunately, 
the  Hessian  matrices  of  many  large  unconstrained  problems  are  quite  sparse,  and  density  tends 
to  decrease  as  n  increases.  Large  problems  can  thus  be  solred  efficiently  using  techniques  that 
exploit  sparsity  in  to  save  work  and/or  storage,  or  that  do  not  require  storage  of 


2.2.  NewtoiHtype  methods.  When  the  Hessian  is  sparse  and  can  be  computed  analytically, 
a  Newton-type  method  can  be  implemented  by  applying  standard  sparse  procedures  to  solye 
HkPk  =  —9k-  lit  particular,  when  Hk  is  positive  definite,  any  efficient  technique  for  computing  a 
sparse  Cholesky  factorisation  may  be  applied  in  this  context  (for  a  survey  of  available  software,  see 
Duff,  1982).  Although  many  linear  ^rgtems  may  need  to  be  solved  before  the  method  converges, 
all  of  them  have  the  same  sparsity  pattern,  and  hence  the  structure  needs  to  be  analyzed  only 
once. 

Indefiniteness  in  a  sparse  Hessian  may  be  treated  using  the  procedures  mentioned  for  the 
dense  case.  The  modified  Cholesl^  factorisation  (Gill  and  Murray,  1974)  has  been  adapted  in 
a  straightforward  fashion  to  treat  sparsi^  (see  Thapa,  1980).  One  SMlvantage  of  the  modified 
Cholesky  approach  is  that  indefiniteness  can  be  detected  and  corrected  while  constructing  the 
factorization  of  the  positive-definite  matrix  to  be  used  in  computing  pk]  hence,  only  one  sparse 
factorisation  needs  to  be  computed  at  each  iteration.  With  trust-region  methods,  pk  may  be 
obtuned  using  off-the-shelf  software  for  a  sparse  Cholesl^  factorisation;  however,  these  methods 
^ically  require  more  than  one  factorisation  per  iteration. 

When  the  gradient  is  available,  but  the  exact  Hessian  is  not,  a  finite-difliBrence  approximation 
to  the  Hessian  may  be  used  m  Hk-  lu  the  general  case,  this  requires  n  gradient  evaluations. 
However,  if  the  sparsity  pattern  of  the  Hessian  is  known  a  priori,  it  is  possible  to  choose  special 
vectors  that  allow  a  finite-difference  approximation  to  H(x)  to  be  computed  with  many  fewer 
than  n  evaluations  of  the  gradient. 

For  example,  suppose  that  H{z)  is  tridiagonal: 

XX  \ 

XXX 
XXX 
XXX 

XXX 
XXX 
X  x> 


/ 

H(x)  = 

c 


Conrider  the  vectors  ^ 

where  Xi  =  (1, 0, 1,0,..  .)^,  =  (0, 1, 0, 1, . .  .)^,  and  h  is  an  appropriate  finite-difference  interval. 
Let  denote  t^  s-th  component  of  pt,  and  similarly  for  The  vectors  pi  and  are 
approximations  to  the  sums  of  odd  and  even  columns  of  Hk,  respectively.  Therefore, 


aw 


8*F 

^•^^8x18x2' 


8^F  a*F 
8x1 8x2  8x2  8x2  * 


and  so  on. 


8*F 

8x2  8x2 


Thus,  for  example, 


Vl.2  —  92,1  w 
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bi  thii  fMhkm,  all  tlw  atemanta  cxf  Hk  can  ba  approodmated  with  only  two  aralnationi  of  tha 
gradtent,  ngardleu  of  the  value  o/n. 

Tha  idea  of  analTiing  tha  i panitj  pattarn  of  tha  Hawian  in  order  to  determine  f nitable  finita- 
diflSBrar'w  vaetors  has  bean  tha  subject  of  much  recant  interest.  An  algorithm  for  finding  suitable 
finita>dilhranea  vaetors  for  a  sparse  (unsTmmatrie)  matrix  is  given  by  Curtis,  Powell  and  Reid 
(1974),  and  is  based  on  grouping  together  columns  in  which  there  are  no  overlapping  elements. 
The  problem  of  finding  a  minimal  set  of  vectors  can  be  viewed  as  a  graph  coloring  problem  in 
the  directed  graph  that  represents  the  sparsity  pattern.  A  proof  that  finding  the  minimal  set  is 
NP-hard  is  given  in  Coleman  and  Mor4  (1981),  along  with  practical  algorithms  (see  also  Coleman 
and  Mor4, 1982a). 

A  similar  relationship  with  graph  coloring  can  be  developed  for  the  case  of  a  symmetric 
matrix;  imposing  the  requirement  of  symmetry  on  the  sparse  matrix  transforms  the  associated 
graph  into  a  undirected  graph.  Although  the  problem  of  finding  a  minimal  set  is  NP-complete  (see 
McCormick,  1981),  effective  algorithms  have  been  developed  based  on  graph-theoretic  heuristics. 
The  algorithms  are  based  on  principles  similar  to  those  for  the  unsymmetric  case,  but  are 
considerably  complicated  by  exploiting  symmetry. 

A  finite-difference  Newton-type  method  for  sparse  problems  thiu  begins  with  a  procedure  that 
analyses  the  sparsity  pattern  in  order  to  determine  suitable  finite-diffsrence  vectors.  Algorithms 
for  finding  these  vectors  have  been  ^ven  by  Powell  and  Toint  (1979)  and  Coleman  and  M(n4 
(1982b).  Once  a  sparse  finite-difference  Hessian  approximation  has  been  computed,  a  sparse 
factorisation  can  be  computed  as  with  the  exact  Hessian. 


2^  Sparse  quasl-Newton  methods.  Because  of  the  great  success  of  quasi-Newton  methods  on 
dense  inoblems,  it  is  natural  to  consider  how  such  methods  might  be  extended  to  take  advantage^ 
of  sparsity  in  the  Hessian.  This  extension  was  suggested  first  for  the  case  of  sparse  nonlinear 
equations  bj  Schubert  (1970),  and  was  analysed  by  Marwil  (1978).  Discussions  of  sparse  quasi- 
Newton  methods  for  optimisation  and  nonlinear  equations  are  given  in  Toint  (1977),  Dennis  and 
Schnabel  (1979),  Toint  (1979),  Shanno  (1980),  Steihaug  (1980),  Thapa  (1980),  Powell  (1981), 
Dennis  and  Marwil  (1982)  and  Sorensen  (1982).  In  the  remainder  of  this  section  we  give  a  brief 
description  of  sparse  quasi-Newton  methods  applied  to  unconstrained  optimisation. 

In  quasi-Newton  methods  for  dense  problems,  the  Hessian  approximation  H/t  is  updated  at 
each  iteration  by  the  relationship 

The  update  matrices  associated  with  many  dense  quasi-Newton  methods  are  of  rank  two, 
and  can  be  shown  to  be  the  minimum-norm  symmetric  change  in  Hk,  subject  to  satisfying  the 
quasi-Newton  condition 

=  Vk,  (2.3) 

where  —  Sk.fi  —  •od  Pk  =  fk+i  —  fk  (*«*,  e.g.,  Dennis  and  Mori,  1977).  By  suitable 
choice  of  the  steplength  Ok  in  (1.1),  the  property  of  hereditary  positive-definiteneu  can  also  be 
maintained  (i.e.,  Nk+i  is  positive  definite  if  Hm  is).  However,  the  update  matrices  Uk  do  not 
retain  the  sparsity  pattern  of  the  Hessian. 

The  initial  approach  to  developing  sparse  quasi-Newton  updates  was  to  impose  the  additional 
constraint  of  retaining  sparsity  on  the  norm-minimisation  problem  (Powell,  1976;  Toint,  1977). 
Let  M  be  defined  as  the  set  of  indices  {(<,/)  |  H^(s)  =  0},  so  that  M  represents  the  specified 
sparsity  pattern  of  the  Hessian,  and  assume  that  Hk  bas  the  same  sparsity  pattern.  A  sparse 
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update  matrix  l/j^  it  then  the  solution  of 

minute  ||Cf|| 

subject  to  (Hk  +  U)»k  =  Vk 

U=U^, 

Uii  =  0  for  {i,3)eM. 


(2.4) 


Let  denote  the  xector  $k  with  the  sparsity  pattern  of  the  j-th  column  of  Hk  imposed. 
When  the  norm  in  (2.4)  it  the  Frobeniut  norm,  the  solution  it  given  by 


=  53  (2-5) 

y-i 

where  ey  is  the  j-th  unit  vector  and  X  is  the  vector  of  Lagrange  multipliers  associated  with  the 
subproblem  (2.4).  The  vector  X  is  the  solution  of  the  linear  system 


^X  =  p*  —  Hk»k, 


(2.6) 


where 

The  matrix  Q  is  symmetric  and  has  the  same  sparsity  pattern  wHkiQ  is  positive  definite  if  and 
only  if  |lo(^)||  >  0  for  all  j.  (The  sparse  analogue  of  any  quasi-Newton  formula  may  be  obtained 
using  a  similar  analysis;  tee  Shanno,  1979,  and  Thapa,  1980). 

Thus  far,  sparse  quasi-Newton  methods  have  not  enjqyed  the  great  success  of  their  dense 
counterparts.  First,  there  are  certun  complications  that  result  from  the  requirement  of  sparsity. 
In  particular,  note  that  the  update  matrix  Uk  (2.5)  is  of  rank  n,  rather  than  of  rank  two;  this 
means  that  the  new  approximate  Hessian  cannot  be  obtained  by  a  simple  update  of  the  previous 
appro^mation.  Second,  an  additional  sparse  linear  system  (2.6)  must  be  solved  in  order  to 
compute  the  update.  Finally,  it  is  not  possible  in  general  to  achieve  the  property  of  hereditary 
positive-definiteness  in  the  matrices  {Hk}  if  the  quasi-Newton  condition  is  satisfied  (see  Toint, 
1979,  and  Sorensen,  1982);  in  fact,  positive-definiteness  may  not  be  retained  even  if  Hk  is  taken 
as  the  exact  (positive  definite)  Hessian  and  the  initial  Zk  is  very  close  to  the  solution  (see  Thapa, 
1980). 

In  addition  to  these  theoretical  difllculttcs,  computational  results  have  tended  to  indicate  that 
currently  available  sparse  quasi-Newton  methods  are  less  effective  than  alternative  methods  (in 
terms  of  the  number  of  function  evaluations  required  for  convergence).  However,  hope  remains 
that  their  eflleiency  may  be  improved  —  for  example,  by  relaxing  the  quasi-Newton  condition 
(2.3),  or  Ity  finding  only  an  approximate  solution  of  (2.6)  (Steihaug,  1982).  For  a  discussion  of 
some  possible  new  approaches,  see  Sorensen  (1982). 

3.4.  Co^lugsit^gradlent  methods.  The  term  coq/ugato-gradient  refers  to  a  class  of  optimisation 
algorithms  that  generate  directions  of  search  without  storing  a  matrix.  They  are  essential  in 
circumstances  when  methods  based  on  matrix  factorisation  are  not  viable  because  the  relevant 
matrix  is  too  large  or  too  dense.  We  emphasise  that  there  are  two  types  of  coqjugate-gradient 
method  —  linear  and  nonlinear. 
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The  Hnear  conjugate-gracUeiit  method  wai  originally  deriyed  as  an  iterative  procedure  for 
lolTing  positiTe-definite  symmetric  systems  of  linear  equations  (Hestenes  and  Stiefel,  1952).  It 
has  been  studied  and  analysed  by  many  authors  (see,  e.g.,  Reid,  1971).  When  applied  to  the 
positiTe-definite  symmetric  linear  system 


Hx  =  -c,  (2.7) 

it  computes  a  sequence  of  iterates  using  the  relation  (1.1).  The  Tector  Pk  is  defined  by 

Pfc  =  —{Hxk  +  e)  +  Pk—xPk—x,  (2-8) 

and  the  step  length  a*  is  given  by  an  explicit  formula.  The  matrix  H  need  not  be  stored  explicitly, 
since  it  appears  only  in  matrix-vector  products. 

With  exact  arithmetic,  the  linear  conjugate-gradient  algorithm  will  compute  the  solution  of 
(2.7)  in  at  most  m  (m  <  n)  iterations,  where  m  is  the  number  of  distinct  eigenvalues  of  H. 
Therefore,  the  number  of  iterations  required  should  be  significantly  reduced  if  the  original  system 
can  be  replaced  by  an  equivalent  system  in  which  the  matrix  has  clustered  eigenvalues.  The  idea 
of  preconditioning  is  to  construct  a  transformation  to  have  this  effect  on  H.  One  of  the  earliest 
references  to  preconditioning  for  linear  equations  is  Axelsson  (1974).  See  Concus,  Golub  and 
O’Leary  (1976)  for  details  of  various  preconditioning  methods  derived  from  a  slightly  different 
viewpoint. 

The  nonlinear  conjugate-gradient  method  is  used  to  minimize  a  nonlinear  function  without 
storage  of  any  matrices,  and  was  first  proposed  by  Fletcher  and  Reeves  (1964).  In  the  Fletcher- 
Reeves  algorithm,  pk  is  defined  as  in  the  linear  case  by  (2.8),  where  the  term  Hxk  +  c  is  replaced 
by  gk,  the  gradient  at  x*.  For  a  nonlinear  function,  in  (1.1)  must  be  computed  by  an  iterative 
step-length  procedure.  When  the  initial  vector  po  is  taken  as  the  negative  gradient  and  ak  is  the 
step  to  the  minimum  of  F  along  pk,  it  can  be  shown  that  each  pk  is  a  direction  of  descent  for  F. 

Many  variations  and  generalisations  of  the  nonlinear  conjugate-gradient  method  have  been 
proposed.  The  most  notable  features  of  these  methods  are:  Pk  is  computed  using  different 
definitions;  pk  is  defined  as  a  linear  combination  of  several  previous  search  directions;  po  is  not 
always  chosen  as  the  negative  gradient;  and  is  computed  with  a  relaxed  linear  search  (i.e.,  ajt 
is  not  necessarily  a  close  approximation  to  the  step  to  the  minimtun  of  F  along  pk).  Furthermore, 
the  idea  of  preconditioning  may  be  extended  to  nonlinear  problems  by  allowing  a  preconditioning 
matrix  that  varies  from  iteration  to  iteration. 

It  is  well  known  that  rounding  errors  may  cause  even  the  linear  coqjugate-gradient  method  to 
converge  very  slowly.  The  nonlinear  conjugate-gradient  method  displays  a  range  of  performance 
that  has  not  yet  been  adequately  explained.  On  problems  in  which  the  Hessian  at  the  solution 
has  clustered  eigenvalues,  a  nonlinear  conjugate-gradient  method  will  sometimes  converge  more 
quickly  than  a  quasi-Newton  method,  whereas  on  other  problems  the  method  will  break  down,  i.e. 
generate  search  directions  that  lead  to  essentially  no  progress.  For  recent  surveys  of  conjugate- 
gradient  methods,  see  Gill  and  Murr^  (1979),  Fletcher  (1980)  and  Hestenes  (1980). 

2.5.  The  truncated  Hnear  eoqfugate-gradient  method.  Much  recent  interest  has  been  focussed  on 
an  approach  to  unconstridned  optimisation  in  which  the  equations  (2.1)  that  define  the  search 
direction  are  ‘solved”  (approximately)  by  performing  a  limited  number  of  iterations  of  the  linear 
conjugate-gradient  method. 

Consider  the  case  in  which  the  exact  Hessian  is  used  in  (2.1).  Dembo,  Eisenstat  and  Steihaug 
(1982)  note  that  the  local  convergence  properties  of  Newton’s  method  depend  on  pk  being  an 
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accurate  solution  of  (2.1)  only  near  tlie  solution  of  the  unconstrained  problem.  Th^  present 
a  criterion  that  defines  the  level  of  accuracy  in  pk  necessary  to  achieve  (piadratic  convergence 
as  the  solution  is  approached,  and  suggest  systematically  ‘truncating”  the  sequence  of  linear 
conjugate-gradient  iterates  when  solving  the  linear  system  (2.1)  (hence  their  name  of  truncated 
Newton  method”).  (See  also  Dembo  and  Steihaug,  1980,  and  Steihaug,  1980.) 

This  idea  has  subsequently  been  applied  in  a  variety  of  situations  —  for  example,  in  computing 
a  search  direction  from  (2.1)  when  Hk  is  a  sparse  quasi-Newton  approximation  (Steihaug,  1982). 
We  therefore  prefer  the  more  specific  name  of  truncated  conjugate-gradient  methods.  These 
methods  are  useful  in  computing  search  directions  when  it  is  impractical  to  store  Hu,  but  it 
is  feasible  to  compute  a  relatively  small  number  of  matrix- vector  products  involving  H^.  For 
example,  this  would  occur  if  H/t  were  the  product  of  several  sparse  matrices  whose  product  is 
dense  (see  Section  3.3.1).  Truncated  conjugate-gradient  methods  have  also  been  used  when  the 
matrix- vector  product  Hjtv  is  approximated  (say,  by  a  finite-difference  along  v);  in  this  case, 
the  computation  of  Pk  requires  a  number  of  gradient  evaluations  equal  to  the  number  of  linear 
conjugate-gradient  iterations  (see,  e.g.,  O’Leary,  1982).  In  order  for  these  methods  to  be  effective, 
it  must  be  possible  to  compute  a  good  solution  of  (2.1)  in  a  small  number  of  linear  conjugate- 
gradient  iterations,  and  hence  the  use  of  preconditioning  is  important. 

With  a  truncated  conjugate-gradient  method,  complications  arise  when  the  matrix  Hk  is  not 
positive  definite,  since  the  linear  conjugate-gradient  method  is  likely  to  bresik  down.  Various 
strategies  are  possible  to  ensure  that  pk  is  still  a  well  defined  descent  direction  even  in  the 
indefinite  case.  For  example,  the  conjugate-gradient  iterates  may  be  computed  using  the  Lanczos 
process  (Paige  and  Saunders,  1975);  a  Cholesky  factorization  of  the  resulting  tridiagonal  matrix 
leads  to  an  algorithm  that  is  equivalent  to  the  usual  iteration  in  the  positive-definite  case.  If  the 
tridiagonal  matrix  is  indefinite,  a  related  positive-definite  matrix  can  be  obtained  using  a  modified 
Cholesky  factorization.  Furthermore,  preconditioning  can  be  included,  in  which  case  the  linear 
conjugate-gradient  iterates  begin  with  the  negative  gradient  transformed  by  the  preconditioning 
matrix.  If  the  preconditioning  matrix  is  a  good  approximation  to  the  Hessian,  the  iterates  should 
converge  rapidly.  Procedures  of  this  type  are  described  in  O’Leary  (1982),  Nash  (1982),  and  Gill 
et  ai.  (1983). 

Further  fiexibility  remsdns  as  to  how  the  result  of  a  truncated  conjugate-gradient  procedure 
may  be  used  within  a  method  for  unconstrained  optimization.  Rather  thsm  simply  being  used  as  a 
search  direction,  for  example,  pk  may  be  combined  with  previous  search  directions  in  a  nonlinear 
conjugate-gradient  method  (see  Nash,  1982,  and  Gill  et  ai.,  1983). 

S.  Linaarly  ConstraiiMd  Optimiiation 

3.1.  Introduction.  The  linearly  constrained  problem  will  be  formulated  as 


LCP  minimise  F(x) 

*€«» 

subject  to  Ax  =:b 
I  <  X  <u 


where  the  m  X  n  matrix  A  is  assumed  to  be  large  and  sparse.  For  simplici^,  we  assume  that  the 
rows  of  /  are  linearly  independent  (if  not,  some  of  them  may  be  removed  without  altering  the 
solution). 

The  most  popular  methods  for  linear*  <’''nstrr'  optimisation  are  active-set  methods,  in 
which  a  subset  of  the  constraints  (the  work*  ^  set;  ^s  used  to  define  the  search  direction.  The 


Sparfe  Matrix  hietbodt  in  Optimisation 


iforldng  set  at  xa  usually  includes  constraints  that  are  satisfied  exactly  at  xa;  the  search  direction 
is  then  computed  so  that  movement  along  pa  ^1  continue  to  satisfy  the  constraints  in  the  working 
set. 

Vi^th  problem  LCP,  the  working  set  will  include  the  general  constrsunts  Ax  =  b  and  some  of 
the  bounds.  When  a  bound  is  in  the  working  set,  the  corresponding  variable  is  fixed  during  that 
iteration.  Thus,  the  working  set  induces  a  partition  of  x  into  fixed  and  free  variables. 

We  shall  not  be  concerned  with  details  of  how  the  working  set  is  altered,  but  merely  emphasise 
that  the  fixed  variables  at  a  given  iteration  are  effectively  removed  from  the  problem;  the 
corresponding  components  of  the  search  direction  will  be  zero,  and  thus  the  columns  of  A 
corresponding  to  fixed  variables  may  be  ignored.  Let  Ai^  denote  the  submatrix  of  A  corresponding 
to  the  free  variables  at  iteration  k]  each  change  in  the  working  set  corresponds  to  a  change  in  the 
columns  of  Ak.  Let  tty  denote  the  number  of  free  variables,  and  the  vector  pk  denote  the  search 
direction  with  respect  to  the  free  variables  only. 

By  analogy  with  (2.2)  in  the  unconstrained  case,  we  may  choose  pk  as  the  step  to  the  minimum 
of  a  quadratic  approximation  to  F,  subject  to  the  requirement  of  remaining  on  the  constraints  in 
the  working  set.  This  gives  pk  as  the  solution  of  the  following  quadratic  program: 

minimize  yfp  +  ^  p^HkP 
subject  to  AkP  =  0, 

where  gk  denotes  the  gradient  and  Hk  the  Hessian  (or  Hessian  approximation)  at  x^  with  respect 
to  the  free  variables. 

The  solution  pk  and  Lagrange  multiplier  Xjt  of  the  problem  (3.1)  satisfy  the  ny  equations 


(X- 


which  will  b^  called  the  augmented  system. 

One  convenient  way  to  represent  pk  involves  a  matrix  whose  columns  form  a  basis  for  the 
null  space  of  Ak.  Such  a  matrix,  which  will  be  denoted  by  Zk,  has  riy  —m  linearly  independent 
columns  and  satisfies  AkZk  =  0.  The  solution  of  (3.1)  may  then  be  computed  by  solving  the 
null-space  equations 

^k^k^kPw  —  ~^k9k  (3-3) 

and  setting 

Pk  =  ZkPM-  (3.4) 

Equations  (3.3)  and  (3.4)  define  a  null-space  representation  of  pk  (so  named  because  it  explicitly 
involves  Zk).  The  vector  Z^gk  and  the  matrix  ZkHk^k  called  the  projected  gradient  and 
projected  Hessian. 


3.2.  Reiwesentatlon  of  the  nuU  space.  The  issues  that  arise  in  representing  Zk  when  Ak  is  sparse 
illustrate  the  need  to  compromise  strategies  that  are  standard  for  dense  problems.  In  the  rest  of 
this  section,  we  shall  drop  the  subscript  k  associated  with  the  iteration. 

In  dense  problems,  it  is  customary  to  use  an  explicit  LQ  or  some  other  orthogonal  factorisation 
of  A  in  order  to  define  Z.  If  AQ  =  (  L  0  ),  where  the  orthonormal  matrix  Q  is  partitioned 
as  (  y  Z  )  and  L  is  lower  trian^ar,  then  AZ  =  0.  In  this  case,  Z  has  the  "ideal”  property 
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that  its  columns  are  orthonormal,  so  that  formation  of  the  projected  Hessian  and  gradient  does 
not  exacerbate  the  condition  of  (3.3)  and  (3.4).  Unfortunately,  for  large  problems  computation  of 
such  a  factorization  is  normally  too  expensive.  (Some  current  research  is  concerned  with  efficient 
methods  for  obtaining  sparse  orthogonal  factorizations;  see  George  and  Heath,  1981.  However, 
the  need  to  update  the  factors  is  an  even  more  serious  difficulty;  see  Heath,  1982,  and  George 
and  Ng,  1982.) 

If  an  orthogonal  factorization  is  unacceptable,  a  good  alternative  is  to  reduce  A  to  triangular 
form  using  Gaussian  elimination  (i.e.,  elementary  transformations  combined  with  row  and  column 
interchanges).  This  would  give  an  LU  factorization  in  the  form 

=  0),  (3.5) 

where  Pi  and  P2  are  permutation  matrices,  U  is  unit  upper  triangular,  and  L  is  lower  triangular. 
The  matrices  Pi  and  P2  would  be  chosen  to  make  U  well  conditioned  and  ||W^||  reasonably  small. 
The  required  matrix 

^ T ) 

would  no  longer  have  orthonormal  columns,  but  should  be  quite  well  conditioned,  even  if  A  if 
poorly  conditioned. 

Unfortunately,  it  is  not  known  how  to  update  the  factorization  (3.5)  efficiently  in  the  sparse 
case  when  columns  of  A  are  altered.  However,  (3.5)  indicates  the  existence  of  a  square,  nonsingular 
submatrix  drawn  from  the  rows  and  columns  of  A.  We  shall  assume  for  simplicity  that  this  matrix 
comprises  the  left-most  columns  of  A,  i.e. 


A  =  iB  S),  (3.7) 

where  B  is  non-singular.  (In  practice,  the  columns  of  B  may  occur  anywhere  in  A.)  It  follows 
from  (3.7)  and  (3.5)  (with  Pi  and  P2  taken  as  identity  matrices)  that  BW  S  =  0,  so  that 
W  =  —B~^S.  Thus,  Z  has  the  form 


2  =  (  *'*)■  P*) 

As  long  as  B  in  (3.7)  is  nonsingular,  the  matrix  Z  (3.8)  will  provide  a  basis  for  the  null  space  of 
A.  In  the  absence  of  the  ideal  factorization  (3.5),  the  aim  must  be  to  choose  a  B  that  is  as  well 
conditioned  as  conveniently  possible,  since  this  will  tend  to  limit  the  size  of  ||W||  and  hence  the 
condition  of  Z. 

The  partition  of  the  columns  of  A  given  by  (3.7)  induces  a  partition  of  the  free  variables, 
which  will  be  indicated  by  the  subscripts  and  *5”.  The  m  variables  are  called  the  basic 
variables.  The  remaining  s  free  variables  (s  =  —  m)  are  called  the  snperbasic  variables.  For 

historical  reasons,  the  fixed  variables  are  sometimes  called  the  nonbasic  variables. 

An  advantage  of  the  form  (3.8)  for  sparse  problems  is  that  operations  with  Z  and  Z^  may 
be  performed  using  a  factorization  of  the  matrix  B;  the  matrix  Z  itself  need  not  be  stored.  For 
example,  the  vector  Z^g  required  in  (3.3)  may  be  written  as 


(3.») 


to 


Spane  Matrix  M«t&oda  in  Optinnaation 


(Tbo  doctor  on  the  rifht-hand  tide  of  (3.9)  it  called  the  reduced  gradient;  note  that  it  ii  simply 
the  projected  gradient  with  a  particular  form  of  Z.)  Thus,  Z^g  may  be  obtained  by  soMng 
—  fa,  and  than  forming  f#  —  5^v.  Similarly,  to  form  p  =  Zp,,  we  haye 

'-(-r 


iHikh  giyet  the  system 

Bpm  —  — Spg. 

Ifi^th  the  reduced-gndient  form  of  Z  (3.8),  the  problems  of  representing  a  null  space  and 
computing  the  associated  projections  reduce  to  the  familiar  operations  of  factorising  and  solying 
with  an  appropriate  square  B. 


Ut,  Sohrlng  for  the  search  direction.  At  each  iteration  of  an  actiye-set  method  for  LCP,  the 
search  direction  p  with  respect  to  the  free  variables  solves  the  subproblem  (3.1).  We  have  seen 
that  there  are  mathematically  equivalent  representations  of  p;  the  way  in  which  p  is  computed 
for  sparse  problems  depends  on  several  considerations,  which  will  be  discussed  below. 

3.3.1.  Solving  the  mdl*spaee  equations.  For  sparse  problems,  it  will  generally  not  be  possible  to 
solve  (3.3)  by  explicitly  forming  and  then  factorising  Z^HZ.  Even  if  H  and  B  are  sparse,  the 
projected  Hessian  will  generally  be  dense.  Thus,  if  a  factorization  of  the  projected  Hessian  is  to 
be  stored,  the  number  of  superbasic  variables  at  each  iteration  must  be  sufficiently  small  (i.e.,  the 
number  of  fixed  variables  must  be  sufficiently  large).  Fortunately,  for  many  large-scale  problems 
there  is  an  a  priori  upper  bound  on  the  number  of  free  variables.  For  example,  if  only  q  of  the 
variables  appear  nonlinearly  in  the  objective  function,  the  dimension  of  the  projected  Hessian 
matrix  at  the  solution  cannot  exceed  q. 

Furthermore,  even  if  the  dimension  of  Z^HZ  is  small,  forming  the  projected  Hessian  m^ 
involve  a  substantial  amount  of  work;  when  Z  is  defined  by  (3.8),  computation  of  Z^HZ  requires 
the  solution  of  2s  systems  of  size  m  X  m.  For  this  reason,  a  Newton-type  method  in  which  the 
projected  Hessian  is  recomputed  at  each  iteration  is  not  generally  practical.  By  contrast,  quasi- 
Newton  methods  can  be  adapted  very  effectively  to  sparse  problems  in  which  the  dimension  of  the 
projected  Hessian  remains  small,  by  updating  a  dense  Cholesky  factorization  of  a  quasi-Newton 
approximation  to  the  projected  Hessian;  this  is  the  method  used  in  the  MINOS  code  of  Murtagh 
and  Saunders  (1977, 1980). 

When  the  projected  Hessian  cannot  be  formed  or  factorized,  the  null-space  equations  may 
be  solved  using  an  iterative  method  that  does  not  require  storage  of  the  the  matrix,  such  as  a 
truncated  conjugate-gradient  method  (see  Section  2.5).  In  order  for  this  approach  to  be  reasonable, 
the  computation  of  matrix-vector  products  involving  Z  and  H  must  be  relatively  cheap  (e.g,  when 
H  is  sparse);  in  addition,  a  good  approximation  to  the  solution  of  (3.3)  must  be  obtuned  in  a  small 
number  of  iterations.  Even  when  the  Hessian  is  not  available,  a  truncated  conjugate-gradient 
method  mqr  be  applied  to  (3.3)  by  using  a  finite-difference  of  the  gradient  to  approximate  the 
vector  HZv;  an  evaluation  of  the  gradient  is  thus  necessary  for  every  iteration  of  the  truncated 
coiyugatt-gradient  method.  Note  that  this  is  one  of  the  few  methods  in  which  H  is  not  required 
to  sparse.  Some  experience  with  a  truncated  conjugate-gradient  approach  in  this  context  is 
described  in  Gill,  et  ai.,  1983. 

Each  of  the  above  methods  for  solving  the  null-space  equations  can  be  adapted  to  allow  for 
changes  in  the  working  set  (Section  3.5). 
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S.S.S.  Solving  the  range>ipMe  eqnatloni.  The  null-space  equations  provide  one  means  of  solv¬ 
ing  for  p  in  the  augmented  system  (3.2),  by  eliminating  When  H  is  positive  definite,  a 
complementary  approach  is  to  solve  for  X  first,  via  the  rutge-Bpace  equations 

AH"‘A^X  =  AH-^g, 

Hp  =  A^\  —  g. 

This  method  vrould  be  appropriate  if  H  were  sparse,  and  if  A  had  relatively  few  rows.  The 
application  of  a  range-space  approach  to  quadratic  programming  is  discussed  by  Gill  et  a/.  (1982). 

S.S.3.  Solving  the  augmented  system.  An  alternative  method  for  obtaining  p  involves  treating 
the  augmented  system  directly.  (Variations  of  this  idea  have  been  proposed  by  numerous  authors; 
see,  e.g.,  Bartels,  Golub  and  Saunders,  1970).  The  most  obvious  way  to  solve  (3.2)  is  to  apply  a 
method  for  symmetric  indefinite  systems,  such  as  the  Harwell  code  MA27  (Duff  and  Reid,  1982). 
In  order  for  the  solution  of  (3.2)  to  be  meaningful,  the  matrix  Z^HZ  must  be  positive  definite. 
Verifying  positive-definiteness  in  this  situation  is  a  nontrivial  task,  since  of  course  the  matrix 
Z^HZ  is  not  computed  explicitly.  However,  the  result  may  sometimes  be  known  a  priori  —  for 
example,  when  H  itself  is  positive-definite. 

Both  H  and  A  change  dimension  when  the  working  set  is  altered.  Updating  procedures  for 
this  case  are  discussed  in  Section  3.6.2. 

3.4.  Factorising  and  solving  a  square  system.  The  linear  systems  involving  B  and  are  typically 
solved  today  using  a  sparse  LU  factorisation  of  B.  Surveys  of  techniques  for  computing  such 
a  factorisation  are  given  in  Duff  (1982)  and  Duff  and  Reid  (1983).  The  analyse  phase  of  a 
factorisation  consists  of  an  analysis  of  the  sparsity  pattern  alone  (independent  of  the  values  of 
the  elements),  and  leads  to  a  permutation  of  the  matrix  in  order  to  reduce  fill-in  during  the 
factorisation.  The  factor  phase  consists  of  computation  with  the  actual  numerical  elements  of 
the  matrix. 

We  shall  mention  a  few  features  of  certun  factorisation  methods  that  have  particular  relevance 
to  optimisation  (see  Duff  and  Reid,  1983,  for  more  details).  Since  active-set  algorithms  include 
a  sequence  of  matrices  that  undergo  column  changes,  the  factorisation  methods  were  typicstlly 
developed  to  be  used  in  conjunction  with  an  update  procedure. 

The  P*  algorithm  of  Hellerman  and  Rarick  (1971, 1972)  performs  the  analyse  phase  separately 
from  the  factor  phase,  and  produces  the  well  known  “bump  and  spike”  structure,  in  which 
B  is  permuted  to  block  lower-triangular  form  with  relatively  few  “spikes”  (columns  containing 
nonseros  above  the  diagonal).  This  procedure  is  very  effective  if  B  is  nearly  triangular.  Also,  the 
factor  phase  is  able  to  use  external  storage,  since  it  processes  B  one  column  at  a  time.  Column 
interchanges  are  used  to  stabilise  the  factorisation.  (Row  interchanges  would  destroy  the  sparsity 
pattern.)  If  an  interchange  is  needed  at  the  s-th  stage,  it  is  necessary  to  solve  a  system  of  the 
form  LjLiP  =  and  to  compute  the  quantities  y^aj  for  all  remaining  eligible  spike  columns  ay. 
This  involves  significant  work  and  also  degrades  the  sparsity  of  the  factors.  Thus,  a  rather  loose 
pivot  tolerance  must  be  used  to  avoid  many  column  interchanges  (e.g.,  |/i|  <  10^,  where  p  is  the 
largest  subdiagonal  element  in  any  column  of  L  divided  by  the  corresponding  diagonal). 

The  Markowits  algorithm  (Markowiti,  1957),  on  the  other  hand,  performs  the  analyse  and 
factor  phases  simultaneously,  and  hence  must  run  in  main  memory.  It  computes  dynamic  “merit 
counts”  in  order  to  determine  the  row  and  column  permutations  to  preserve  sparsity  and  yet 
retain  numerical  stability.  The  Markowiti  procedure  can  achieve  a  good  sparse  factorisation  even 
with  a  rather  strict  pivot  tolerance  (e.g.,  ||i|  <  10). 


I 


Sp»M  Matrix  Mtiboda  in  Optuaiaatioo 


12 


Tabtol 

Summaiy  of  Problem  Cherecteriitiet 


Stair  1 

stairs 

StoIrS 

OPF  1 

OPF  2 

B  rows 

SS7 

T45 

1170 

1200 

8400 

B  aeatoret 

1500 

8600 

7100 

6000 

26000 

P*  blocks 

1 

5 

18 

1 

— 

P*  splkos 

66 

101 

157 

715 

— 

Tabl«2 

Number  of  Nonteroi  in  initial  LI/  faetoriiation  and  after  k  updates 


Stair  1 

StldrS 

stairs 

OPF  1 

OPF  8 

LoUo  wltb  P*  (MINOS) 

6400 

16800 

88000 

80400 

— 

LtVt  witb  Markowits  (LAOS) 

5400 

4700 

18500 

18800 

75000 

k 

SO 

50 

50 

SO 

40 

LjkVfc  with  LAOS 

7800 

6000 

17100 

15800 

88000 

In  order  to  indicate  how  these  factor  routines  perform  on  matrices  that  arise  in  optimisation, 
we  gire  results  on  fire  test  problems.  In  the  first  three  problems,  the  matrix  B  has  "staircase* 
structure  (see,  e.g.,  Fourer,  1982);  constraints  of  this  form  often  arise  in  the  modeling  of  <^amic 
systems,  in  which  a  set  of  actiTities  is  replicated  over  several  time  periods.  The  fourth  and  fifth 
problems  arise  from  the  optimal  power  flow  (OFF)  problem  (see,  e.g.,  Stott,  Alsac  and  Marinho, 
1980).  In  this  ease,  B  is  the  Jacobian  of  the  network  equations  of  the  power  system,  and  has  a 
symmetric  sparsity  pattern  (which  is  not  at  dl  triangular!)  Table  1  shows  some  of  the  relevant 
features  of  the  problems  described,  including  the  results  of  faetoriiation  with  the  P*  algorithm. 

The  number  of  nonseros  in  the  initial  LU  faetoriiation  of  B  is  shown  in  the  first  two  rows 
of  Ikble  2.  The  P*  algorithm  is  u  implemented  in  the  MINOS  code  of  Murtagh  and  Saunders 
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(1977,  1980);  the  Markowiti  procedure  is  the  Hanvell  code  LAOS  (Reid,  1976,  1982).  Note  that 
the  large  number  of  spikes  in  the  first  OFF  problem  is  bound  to  cause  difficulties  for  the  P* 
algorithm. 

8.5.  Coluinn  updates 

For  problems  of  the  form  LCP,  each  change  in  the  leorking  set  inTolTes  changing  the  status  of 
a  variable  from  fixed  to  f^ee  (or  vice  versa).  When  a  previously  fixed  variable  bMOmes  freed,  a 
column  of  J(  is  added  to  A;  this  poses  no  particular  difficulty,  since  the  nevr  column  can  simply 
be  appended  to  S.  When  a  free  variable  is  to  become  fixed,  a  column  of  A  must  be  deleted, 
and  complications  arise  if  the  column  is  in  B.  Since  the  number  of  columns  in  B  must  remain 
constant  (in  order  for  0  to  be  nonsingular),  it  is  necessary  to  replace  a  column  of  B  vrith  one  of 
the  columns  of  S. 

Assume  that  vre  are  given  an  initial  Bo,  vrhich  thereafter  undergoes  a  sequence  of  column 
replacements,  each  corresponding  to  one  of  the  free  variables  becoming  fixed  on  a  bound.  Let  Ik 
denote  the  index  of  the  column  to  be  replaced  at  the  k-th  step,  ak  denote  the  Ik-th.  column  of  B, 
Vk  denote  the  nevr  column,  and  denote  the  ij^-th  column  of  the  identity  matrix.  After  each 
replacement,  vre  have 

0fc  =  S*_iH-(t;*-a*)e,^.  (3.10) 

We  shall  consider  several  vrays  in  vrhich  ^sterns  of  equations  involving  Bk  can  be  solved  following 
a  sequence  of  such  changes. 

8.5.1.  The  product-form  update.  The  standard  updating  technique  used  in  all  early  sparse  LP 
codes  was  the  product-form  (PF)  update  (e.g.,  Dantsig  and  Orchard-Hays,  1954).  It  follows  from 
the  definition  of  Bk  that 

Bk  =  Bk—iTk, 


where 

Bk-iyk  =  Vk  and  T*  =  /  +  (y*  —  eje (3.11) 

Note  that  Tk  is  a  permuted  triangular  matrix  (with  only  one  nontrivial  column);  equivalently,  Tk 
is  a  rank-one  modification  of  the  identity  matrix.  The  matrix  Tk  can  be  represented  by  storing 
the  index  Ik  and  the  vector  yt- 
After  k  such  updates  we  have 


Bk  —  BoTiTa-  *  •  Tk.  (3.12) 

Given  a  procedure  to  solve  systems  of  equations  involving  Bo,  (3.12)  indicates  that  solving  BkV  = 
ft  is  equivalent  to  solving  the  k  -|-  1  linear  systems 

Bb»o  =  ft,  TiVi=^vo,  ...,  TkVk  =  Vk-i,  (3.13) 

where  the  systems  involving  Tj  are  ea^  to  solve.  As  k  increases,  the  solution  process  becomes 
progressively  more  protracted,  and  the  storage  required  to  store  the  updates  is  strictly  increaung. 
Therefore  it  becomes  worthwhile  to  compute  a  factorisation  of  Bk  from  scratch.  Most  current 
systems  use  an  initial  triangular  factorisation  Bo  =  LoUo  (see  Section  3.4),  and  recompute  the 
factorisation  after  k  updates  (typically  k  <  50). 

The  PF  update  has  two  important  advantages  for  sparse  problems.  First,  the  vectors  {gy} 
may  be  stored  in  a  single  sequential  file,  so  that  implementation  is  straightfoiward.  Second,  any 
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admuM  in  thn  mathodt  for  liiww  equatioiu  is  immediatelj  applicable  to  the  factorisation  of  Bo, 
since  the  update  does  not  alter  the  initial  factorisation.  Thus,  Bo  maj  be  represented  bj  a  *bladc 
bon”  procedure  for  solTing  equations  (inTolring  both  Bo  and  B^). 

Unfortunately,  the  PF  update  has  two  significant  deficiencies.  It  is  numerically  unreliable  if 
^  (since  7s  is  then  ill-conditioned),  and  the  growth  of  data  defining  the  updates 
is  significant^  greater  than  for  alternatiTe  schemes. 

S.S.2.  The  Bartei^Golnb  update.  The  instability  of  the  PF  update  was  first  made  prominent 
by  Bartels  and  Golub  (1969),  who  showed  as  an  alternatiTe  that  an  LU  factorisation  can  be 
updated  in  a  stable  manner  (see  also  Bartels,  Golub  and  Saunders,  1970;  Bartels,  1971).  GiTen 
an  initial  factorisation  Bo  =  LoUo,  the  updates  to  L  are  represented  in  product  form,  but  the 
sparse  triangular  matrix  U  is  stored  (and  updated)  expJieitly.  Thus,  instead  of  the  form  (3.12) 
we  hare 

B*  =  LoT,ra-  •  •  =  L»U*,  (3.14) 

where  each  7y  represents  an  update  whose  construction  will  be  discussed  below. 

At  the  h-th  step,  replacing  the  I^-th  column  of  Bk—i  gives 

Bk  =  Lk-tU, 

where  0  is  identical  to  Us— i  except  for  its  /^-th  column.  Since  Uk—i  is  stored  as  a  sparse  matrix, 
it  is  desirable  to  restore  U  to  upper-triangular  form  Uk  without  causing  substantial  fill-in.  To 
this  end,  let  P  denote  a  y  ^Jc  permutation  that  moves  the  Ik-th  row  and  column  of  U  to  the  end, 
and  shifts  the  intervening  rows  and  columns  forward.  We  then  have 


The  nonseros  in  the  bottom  row  of  P^P  may  be  eliminated  by  adding  multiples  of  the  other 
rows.  However,  it  follows  from  the  usual  error  analysis  of  Gaussian  elimination  (e.g.,  ^i^ldnson, 
1965)  that  this  procedure  will  not  be  numerically  stable  unless  the  sise  of  the  multiple  is  bounded 
in  some  way.  Hence,  we  must  allow  the  last  row  to  be  interchanged  with  some  other  row. 
Formally,  the  row  operations  are  stabilised  elementary  transformations  (\l^lldnson,  1965),  which 
are  constructed  from  2X2  matrices  of  the  form 


(Note  that  the  transformati<m  M  includes  a  row  interchange.)  Each  such  transformation  is 
represented  by  the  scalar  p,  and  is  unncessary  if  the  element  to  be  eliminated  is  already  sero. 
Numerical  stability  is  achiev^  hj  choosing  between  M  and  M  so  that  the  multiplier  p  is  bounded 
in  site  by  some  moderate  number  (e.g.,  \p\  <  1,  10  or  100).  The  matrices  {7y}  in  (3.14)  are 
constructed  fh>m  sequences  of  matrices  of  the  form  (3.15). 
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Unfortunately,  elimination  of  the  nonierot  it  "eatier  tud  than  done”  in  the  sparse  case.  Any 
transformation  of  type  M  amounts  to  a  form  of  fill-in,  since  the  location  of  nonseros  in  the 
interchanged  rows  is  unlikely  to  be  the  same.  A  complex  data  structure  is  therefore  needed  to 
update  Uk  without  losing  efficiency  during  subsequent  solves.  (Holding  individual  nonseros  in  a 
linked  list,  for  example,  would  not  be  acceptable  in  a  virtual-memory  environment.) 

The  implementation  of  the  BG  update  by  Saunders  (1976)  capitalises  on  the  *bump  and 
spilm”  structure  revealed  by  the  P*  procedure  (see  Section  3.4).  Each  triangular  factor  is  of  the 


and  fill-in  can  occur  only  within  F*.  II  Uo  contains  a  spilms,  the  dimension  of  Fk  will  be  at  most 
k.  Storing  Fk  as  a  dense  matrix  allows  the  BG  update  to  be  implemented  with  maximum 
stability  (|/i|  <  1  in  (3.15)),  and  the  approach  is  efficient  as  long  as  s  is  not  unduly  large  (say, 
a  <  100).  This  implementation  has  been  used  for  several  years  in  the  nonlinear  programming 
system  MINOS  (Murtagh  and  Saunders,  1977,  1980).  During  that  period,  the  number  of  spilms 
in  Uo  has  proved  to  be  favorably  smsdl  for  many  sparse  optimisation  models.  However,  two 
important  applicatioiu  are  now  known  to  give  unacceptably  large  numbers  of  spikes;  time-period 
models  (for  which  B  has  a  staircase  structure)  and  optimal  power-fiow  problems  (for  which  B  has 
a  symmetric  sparsity  pattern).  Some  statistics  for  these  problems  are  given  in  Table  1  (Section 
3.4). 

Another  implementation  of  the  BG  update  has  been  developed  by  Reid  (1976,  1982)  as  the 
Fortran  package  LAOS  in  the  Harwell  Subroutine  Library.  It  strikes  a  compromise  between  dense 
and  linked-list  storage  by  using  a  whole  row  or  column  of  Uk  as  the  ‘unit*  of  storage.  Thus,  the 
nonseros  in  any  one  row  of  Uk  are  held  in  contiguous  locations  of  memory,  as  are  the  corresponding 
column  indices,  and  an  ordered  list  points  to  the  beginning  of  each  row.  To  facilitate  searching,  a 
similar  data  structure  is  used  to  hold  just  the  sparsity  pattern  of  each  column  (i.e.,  the  row  indices 
are  stored,  but  not  the  nonseros  themselves).  This  storage  scheme  is  also  suitable  for  computing 
an  initial  LU  factorisation  using  the  Markowits  criterion  and  threshold  pivoting  —  a  combination 
that  has  been  eminently  successful  in  practice,  particularly  on  the  structures  mentioned  above. 
Table  2  (Section  3.4)  shows  the  small  increase  in  the  number  of  nonseros  using  LAOS. 

We  note  that  in  the  context  of  column  updating,  the  stability  test  in  the  initial  factorisation 
should  ideally  be  performed  along  the  columns  of  Lo,  rather  than  along  the  rows  of  Uo  as  in 
the  existing  LAOS,  in  order  to  ensure  that  Lo  in  (3.14)  is  well  conditioned.  (This  is  necessary  to 
achieve  the  following  desirable  property:  the  factorisation  of  Bk  is  likely  to  be  well  concUtioned 
if  Bk  is  well  conditioned,  even  if  Bo  is  not.)  For  efficiency  the  data  structure  for  computing  Uo 
then  needs  to  be  transposed.  This  and  other  improvements  will  be  incorporated  in  a  new  version 
of  LAOS  (Reid,  private  communication). 

In  the  meantime,  the  sparsity  properties  of  LAOS  are  unsurpassed,  and  the  numericiJ  properties 
are  excellent  as  long  as  Bq  i>  not  extremely  ill-conditioned.  The  package  should  therefore  find 
increasingly  vridespread  application. 

3.54.  The  Forrest-TomBn  update.  The  update  of  Forrest  and  Tomlin  (1972)  was  developed  as  a 
means  of  improving  upon  the  sparsity  of  the  PF  update  while  retaining  the  ability  to  use  external 
storage  where  necessary.  In  fact  the  FT  update  is  a  restricted  form  of  the  BG  update,  in  which  no 
row  interchanges  are  idlowed  when  eliminating  the  bottom  row  of  P^P.  This  single  difference 
removes  the  fill-in  difficulty  (but  at  the  expense  of  losing  guaranteed  numerical  stability). 

Algebraically,  a  new  column  «k  is  added  to  Uk—i,  the  Is-th  column  and  row  are  deleted,  and 
the  transformations  M  are  combined  into  a  single  ‘row*  transformation  Hk  =  f + ~  *<»)’'• 
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R  can  be  shown  that  the  required  Tectori  satisfy 

^k—l^h  —  1***  =  *i*»  (3.16) 

and  the  new  diagonal  of  Uk  is  rj***  importantly,  the  multipliers  /i  are  closely  related  to  the 
elements  of  tk,  and  these  can  be  tested  a  posteriori  to  determine  whether  the  update  is  acceptable 
(see  also  Tomlin,  1975).  In  practice  a  rather  undemanding  test  such  as  |p|  <  10*  must  be  used  to 
avoid  rejecting  the  update  too  frequently.  The  FT  update  is  now  used  within  several  commercial 
mathematical  programming  systems. 

3.5.4.  Use  irfthe  Schur  complement.  The  work  of  Bisschop  and  Meeraus  (1977, 1980)  has  recently 
provided  a  new  perspective  on  the  problem  of  updating  within  active-set  methods.  Suppose  that 
for  each  update  a  vector  vj  replaces  the  I^-tb  column  of  Bo.  A  key  observation  is  that  the  system 
Bits  =  6  is  equivalent  to  the  system 

fBo 

\Ik 

where 

Vjt  =  ( »i  »2  •  •  •  Ofc  ),  Ik  =  (  «»i  eit  •  •  •  )^- 

Note  that  the  rectangular  matrix  ijt  is  composed  of  k  rows  of  the  identity  matrix  corresponding 
to  indices  of  columns  that  have  been  replaced.  Since  the  equations  Jitp  =  0  set  k  elements  of  p 
to  sero,  the  remaining  elements  of  p  and  x  together  give  the  required  solution  z.  Similarly,  the 
system  B^p  =  d  is  equivalent  to 

(3"X:)-(:) 


(3.17) 


if  di  and  ds  constructed  from  d  appropriately  (with  the  aid  of  k  arbitrary  elements,  such  as 
lero). 

The  matrix  in  (3.17)  may  be  factorised  in  several  different  ways.  In  the  next  two  sections  we 
consider  the  simplest  factorisation 

(j  '■■)-(?  cX'  ■;■) 

where 

Bon  =  Vlk, 

Cm  =  -An. 

The  k  X  k  matrix  Cm  is  the  Scbur  complement  for  the  partitioned  matrix  on  the  left-hand  side 
of  (3.19).  It  corresponds  to  a  matrix  of  the  ubiquitous  form  D  —  WB~^V  (e.g.,  see  Cottle,  1974). 

M.S.  A  stabOlied  prod«ct>fwm  update.  From  (3.17)  and  (3.19)  we  see  that  the  vectors  p  and  x 
needed  to  construct  the  solution  of  B*z  =  h  may  be  obtained  from  the  equations 

Bow  =  b, 

CkX  =  — 

P  =  w  —  YkX. 


(3.19) 

(3.20) 


(3.21a) 

(3.216) 

(3.21e) 
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Similar^,  the  lolntion  of  BjV  =  (f  is  obtained  fTom  the  two  linear  qretemi 

Clt  =  d^-Yld^,  (3.22o) 

Bjy  =  di  -  llz.  (3.22J) 

Assuming  that  Yk  is  available,  the  essential  operations  in  (3.21)  and  (3.22)  are  a  solve  with  Bb 
and  a  solve  with  Ck.  If  ife  is  small  enough  (saj,  k  <  100),  Ck  maj  be  treated  as  a  dense  matrix. 
It  is  then  strughtforward  to  use  an  orthogonal  factorisrtion  QkCu  =  Rk  =  I,  Rk  upper 

triangular)  or  an  analogous  factorisation  LkCn  =  Uk  based  on  Gaussian  elimination  (L^  square, 
Uk  upper  triangular).  These  factorisations  can  be  maintained  in  a  stable  manner  as  Ck  is  updated 
to  reflect  changes  to  B*.  (The  updates  involve  adding  and  deleting  rows  and  columns  of  Ck",  see 
Gill  et  a/.,  1974.)  The  stability  of  the  procedures  (3.21)  and  (3.22)  then  depends  essentially  on 
the  condition  of  Bq.  In  other  words,  if  ^  is  well  conditioned,  we  have  a  stable  method  for  solving 
Bjtz  =  h  for  many  subsequent  k. 

This  method  retuns  several  advantages  of  the  PF  update.  The  vectors  to  be  stored  (columns 
of  Ifc)  satisfy  Boy*  =  Vk,  which  is  analogous  to  (3.11).  These  vectors  should  have  spvsity 
similar  to  those  in  the  PF  update,  and  they  can  be  stored  sequentially  (in  compact  form  on  an 
external  flle,  if  necessary).  A  further  advantage  is  that  whenever  a  column  of  Ck  is  deleted,  the 
corresponding  vector  yk  may  be  skipped  in  subsequent  uses  of  (3.21c).  This  gain  would  tend  to 
offset  the  work  involved  in  maintaining  the  factors  of  Ck.  Because  of  the  pvallels,  the  method 
described  here  amounts  to  a  practical  mechanism  for  stabilising  an  implementation  based  on  the 
PF  update. 

3.5.6.  The  Sehoi^eomplement  update.  One  of  the  aims  of  Blsschop  and  Meeraus  (1977, 1980)  was 
to  give  an  update  procedure  whose  storage  requirements  were  independent  of  the  dimension  of 
Bo.  This  is  achievable  because  the  matrix  Yk  is  not  essential  for  solving  (3.17)  and  (3.18),  given 
Vk  and  a  "black  box"  for  Bq.  For  example,  (3.21e)  may  be  replaced  by 

Boy  =  k-VkZ,  (3.23), 

and  hence  storage  for  Yk  can  be  saved  at  the  expense  of  an  additional  solve  with  Bb-  Similuly, 
(3.22a)  is  equivalent  to 


BJw  =  di, 

Clz  =  <h-Vlw, 


again  involving  a  second  solve  with  Bb.  Note  that  the  orig^  data  Vs  vrill  usually  be  more  sparse 
than  Yk,  so  that  the  additional  expense  may  not  be  substantial. 

The  storage  required  for  a  dense  orthogonal  factorisation  of  Ck  (|ft^)  it  small  for  moderate 
values  of  k.  As  with  the  PF  update,  any  advance  in  solving  linear  eq^ions  is  immediatety 
applicable  to  the  equations  involving 

The  method  is  particularly  attractive  when  Bb  has  special  structure.  For  example,  certain 
linear  programs  have  the  following  form: 

minimise  e^t 

subject  to  (Bb  N)z  =  h 


It 
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where  i*  •  iqatre  bloek-diagonel  matrix: 


Bq  =  block-ditc(  )■ 


Airaming  that  the  iquare  matricei  Dj  are  well  conditioned,  Bo  proridet  a  natural  starting  basis 
for  the  umplex  method. 

HiTith  the  Sehur-complement  (SC)  update,  an  iteration  of  the  simplex  method  on  such  a 
problem  requires  four  solves  inth  Bo,  and  hence  four  solves  with  each  matrix  Dj.  In  certain 
applications,  the  matrices  Dj  are  closely  related  to  Do  (e.g.,  in  time-dependent  problems),  in 
which  case  a  farther  application  of  the  Schur-complement  technique  would  be  appropriate.  A 
simplex  iteration  then  involves  only  solves  with  Do- 

This  is  a  situation  in  which  one  factorisation  is  followed  by  hundreds  or  even  thousands  of 
solves  (involving  both  Do  and  D^).  Thus,  it  is  useful  for  black-box  solvers  to  be  tuned  to  the 
case  of  multiple  right-hand  sides. 


S.S.7.  The  partitioned  LU  update.  Recall  that  the  PF  approach  accumulates  updates  in  a 
single  file,  while  the  BG  and  FT  methods  seek  to  reduce  the  storage  required  for  the  updates  by 
updating  two  separate  factors  (one  implicitly  through  a  file  of  updates,  the  other  explicitly).  Here 
we  suggest  leaving  Lo  uid  Uo  unaltered  (in  effect,  treating  them  as  two  ‘black  boxes”  for  solving 
linear  ^sterns),  and  accumulating  two  flies  of  updates.  In  place  of  the  block  factorisation  (3.19) 
we  can  write 

/  V.  \  /  \/  TK  w.  \ 

(3.24) 


(t  -MS  ..X'‘  ?) 


with  the  same  deflnition  (3.20)  of  Cu.  After  the  k-th  update,  the  new  column  of  Wk  and  row  of 

Ra  satisfy 

LoWjt  =  Vk  and  =s  (3.25) 

The  similarity  of  (3.25)  with  the  equations  (3.16)  for  the  FT  update  leads  us  to  suppose  that  the 
storage  requirements  would  be  at  least  as  low  as  for  the  FT  update.  Apart  from  the  need  to  store 
and  update  C*,  all  implementation  advantages  are  retained  (in  fact  improved  upon,  since  Uo  is 
not  altered).  As  with  the  PF  and  SC  updates,  the  stability  depends  primarily  on  the  condition 
of  Bo-  We  could  therefore  regard  the  factorization  (3.24)  as  a  practical  and  stable  alternative  to 
the  FT  update. 


3.5.8.  Avdding  access  to  Bo*  In  active-set  methods,  it  is  often  necessary  to  solve  the  equations 
BkZ  =  V,  where  e  is  a  column  of  the  matrix  A.  Although  v  will  not  be  a  column  of  Bk,  it  could 
be  a  column  of  Bo.  If  Bo  were  not  stored  in  main  memory,  it  would  be  desirable  to  access  its 
columns  as  seldom  as  possible.  In  this  section  we  shall  show  that  with  the  PF  update  or  the 
Schur-complement  updates,  the  elements  of  Bo  need  not  be  accessed  once  the  initial  factorisation 
has  been  complex. 

Assume  that  v  is  the  Irih.  column  of  Bo,  so  that  v  —  Boei  by  deflnition.  For  the  PF  update 
it  follows  by  substituting  the  expression  for  v  in  (3.13)  that 


Tx-'’Tt,x  =  Si, 

which  gives  an  equation  for  x  that  does  not  involve  e  or  Bb*  With  the  Schur-complement  approach, 
(3.21a)  reduces  to  w  =  si,  while  (3.23)  can  be  rearranged  to  give  Bo{y  —  ti)  as  —Vk*.  In  either 
ease,  when  solving  for  x  we  can  avoid  not  only  an  explicit  reference  to  the  elements  of  Bb  but 
also  a  s<dva  with  Bb. 
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Similarly,  it  is  often  necessary  to  sol^e  Bftf  =  d  and  then  to  form  'jj  =  for  each  column 
Vj  that  has  been  replaced  in  Bq.  (The  quantities  7^  are  the  reduced  costs  or  reduced  gradients 
for  variables  that  have  been  removed  from  Bq.)  If  t  denotes  the  product  B^y,  then  by  definition 
of  Vj  it  follovrs  that  y'^Vj  =  t^ei^ .  >^th  both  the  PF  and  the  Schur-complement  updates,  f  is  a 
by-product  of  the  procedure  for  computing  y.  Thus,  t  and  all  relevant  values  7^  are  available  at 
no  cost. 

These  results  confirm  that  Bb  need  exist  only  in  the  form  of  a  "black  box”  for  solving  linear 
systems. 

3.6.  Other  applications  of  the  Sehm^compicmeBt  update.  Historically,  the  formulation  LCP 
has  been  used  because  it  involves  only  column  updates  to  Bk,  vrhich  have  appeared  to  be  the 
least  difficult  kind  of  update  to  implement  for  sparse  problems.  However,  the  Schur-complement 
approach  also  applies  to  more  general  sequences  of  related  square  systems.  As  vrith  column 
replacement,  the  key  idea  is  to  solve  a  partitioned  system  that  involves  the  original  matrix. 

3.6.1.  Unsymmetrie  rank*one  updates.  Consider  the  case  in  which  Bo  undergoes  a  sequence  of 
rank-one  modifications: 

Bk  =  =  ^0  +  VkSl 

The  solution  of  B^x  =  6  is  part  of  the  solution  of  the  extended  system 

(Kron,  1956;  Bisschop  and  Meeraus,  1977).  Given  factorisations  of  Bo  and  the  Schur  complement 
Ck  =  —I  —  SlB^^Vk,  the  solution  may  be  obtained  from 

CkZ  =  —Slw, 

Box  =  b  —  Vkt, 

where  Bqw  =  b.  An  alternative  that  would  require  more  storage  but  less  work  could  be  obtained 
by  using  Bo  =  LqUo  and  storing  the  vectors  defined  by  Low*  =  v*,  UoTk  =  Let  Rk  denote 
the  matrix  whose  j-th  column  is  ry,  and  similarly  for  Wit.  In  this  case,  the  solution  of  (3.26) 
would  be  obtained  from 

CkZ  =  — B^t; 

Uox  =  v  —  Wit*, 

where  Lqv  =  b.  Either  approach  is  an  alternative  to  updating  a  factorisation  of  Bk  itself  (e.g., 
Gille  and  Loute,  1981, 1982),  which  is  even  more  difficult  to  implement  than  the  BG  update. 

We  emphasise  that  column  or  row  replacements  are  best  treated  as  a  special  case,  not  as  a 
sequence  of  general  rank-one  modifications. 

3.6.2.  A  symmetric  Sehni^eomplemeiit  update.  It  vras  observed  in  Section  3.1  that  in  some 
circumstances  the  search  direction  can  be  computed  by  solving  the  linear  system  (3.2)  involving 
the  augmented  matrix 

(3.27) 

YTithin  an  active-set  method,  changes  in  the  status  of  fixed  and  free  variables  lead  to  changes  in 
B  and  A.  When  a  variable  becomes  fixed,  the  corresponding  row  and  column  of  Afk  are  deleted; 
when  a  variable  is  freed,  a  new  row  and  column  of  Afk  are  added. 


ao 
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Inftaad  of  updating  a  faetoriiation  of  Ma,  we  can  ftart  with  some  Mo  and  work  with  an 
augmented  qrstem  of  the  form 

(5  ")• 

If  a  variable  is  fixed  at  the  ft-th  change,  the  k-th  column  of  Sa  is  an  appropriate  coordinate  Tector; 
if  the  t-th  Tariable  is  freed,  the  column  is 


where  hi  is  obt^ed  from  the  ^th  column  of  the  full  Hessian,  and  ai  is  the  1-th  column  of  A.  The 
solution  of  the  augmented  system  corresponding  to  the  k-th  working  set  can  then  be  obtained 
using  a  factorisation  of  Mo  and  a  factorisation  of  the  Schur  complement  Ca  =  —SlM^^Sa- 


3.7.  Linear  and  qundratie  programming*  Two  important  special  cases  of  LCP  are  linear  and 
quadratic  programs.  Since  there  are  no  user-supplied  functions,  the  computation  in  linear  and 
quadratic  programming  methods  inyolyes  primarily  linear  algebraic  operations. 

3.7.1.  Large-scale  linear  programming.  Large-scale  linear  programs  occur  in  many  important 
applications,  such  as  economic  planning  and  resource  allocation.  Methods  and  software  for  iarge- 
scale  LP  have  thus  achieved  a  high  lerel  of  sophistication,  and  many  of  the  techniques  discussed 
in  Section  3  were  designed  originally  for  use  within  the  simplex  method. 

Much  research  has  involved  linear  programs  with  special  structure  in  the  constraint  matrix  — 
for  example,  those  arising  from  networks  or  time-dependent  systems.  It  is  impossible  to  summarise 
methods  for  specially-structured  linear  programs  in  a  survey  paper  of  this  type.  However,  to 
illustrate  the  flavor  of  the  work,  we  consider  staircase  linear  programs  (which  were  used  in  the 
examples  of  Section  3.4).  These  arise  in  modeling  time-dependent  processes;  the  recent  book 
edited  by  Dantiig,  Dempster  and  Kallio  (1981)  is  entirely  devoted  to  such  problems.  It  has  long 
been  observed  that  the  simplex  method  tends  to  be  less  efficient  on  staircase  problems  than  on 
general  LPs.  To  correct  this  deficiency,  work  has  tended  to  proceed  in  two  directions.  First,  the 
simplex  method  can  be  adapted  to  take  advantage  of  the  staircase  structure,  by  using  special 
techniques  for  factorising,  updating,  and  pricing  (Fourer,  1982).  Second,  special-purpose  methods 
can  be  designed  to  exploit  particular  features  of  the  problem.  For  staircase  problems,  several 
variations  of  the  decomposition  approach  (Dantsig  and  Wolfe,  1960)  have  been  suggested.  The 
basic  idea  is  to  solve  the  the  proUem  in  terms  of  smaller,  nearly  independent,  subproblems. 

3.7.2.  Lafge*scak  quadratic  programming.  A  general  statement  of  the  quadratic  programming 
problem  is 

minimise  c^x  ^ 
sen**  2 

subject  to  Ax  =  b 

I  <  X  <  u, 


where  If  is  a  qrmmetrie  matrix. 

An  early  approach  to  quadratic  programming  was  to  transform  the  problem  into  a  linear 
program,  which  is  then  solved  by  a  modified  LP  method  (e.g.,  Beale,  1967).  The  most  popular 
quadratic  programming  algorithms  are  now  based  on  the  active-set  approach  described  in  Section 
3.1  (for  a  comprehensive  survey  of  QP  methods,  see  Cottle  and  Djang,  1979),  and  the  search 
direction  is  defined  by  the  subproblem  (3.1).  Efficient  methods  for  sparse  quadratic  programs 
thus  inwdve  specialising  the  techniques  discussed  in  Section  3.3  for  the  special  case  when  the 
Hessian  is  constant. 
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4.  MowMii— i^y  Comtrrimd  OpUniutiM 

The  nonliaearly  eoBitraiaed  optindiation  imblem  it  utumed  to  be  of  the  foUowing  form; 


NCP 

mimmise 

F(x) 

s€n* 

subject  to 

e{x)  =  0 
t  <  X  <  u 

I 


i 

i 

I 
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where  e(z)  is  a  Tector  of  m  nonlinear  eonstrmnt  fhnetiont.  We  shall  utume  that  these  constraints 
are  “sparse”,  in  the  sente  that  the  m  X  n  Jacobian  matrix  A(x)  of  c(x)  is  sparse.  For  simplicity, 
we  shall  usually  not  distinguish  between  linear  and  nonlinear  constraints  in  e(x).  Howerer,  it  it 
usually  considered  desirable  to  treat  linear  and  nonlinear  constraints  separately. 

Problems  with  nonlinear  constr^ts  are  considerably  more  difficult  to  solve  than  those  with 
only  linear  constraints.  There  it  an  enormous  literature  concerning  methods  for  nonlinear  con¬ 
straints;  recent  orerviewt  are  given  in  Fletcher  (1981)  and  Gill,  Murray  and  Wright  (1981).  In  this 
section,  we  shall  concentrate  on  the  impact  of  sparsity  rather  than  attempt  a  thorough  discussion 
of  the  methods. 

One  aspect  of  NCP  that  it  directly  relevant  to  sparse  matrix  techniques  it  that  any  tuper- 
linearly  convergent  algorithm  must  consider  the  curvature  of  the  nonlinear  constraint  functions, 
and  thus  the  Hessian  of  interest  is  the  Hessian  of  the  Lagrungitn  function  rather  than  the  Hessian 
of  F  alone.  Let  the  Hessian  of  the  Lagrangian  function  be  denoted  by  W{x,  X)  ~  H{x)  — 
^«*^<(<)i  where  Ht  is  the  Hessian  of  c<.  At  first,  it  might  appear  unlikely  that  this  matrix 
would  be  sparse,  since  it  is  a  weighted  sum  of  the  Hessians  of  the  objective  function  and  the 
constrs^ts.  However,  sparsity  in  the  gradient  of  a  nonlinear  constraint  alwtys  implies  sparsity 
in  its  Hessian  matrix.  For  example,  if  the  gradient  of  e<(s)  contidns  five  nonsero  components, 
the  corresponding  Hessian  matrix  H<(x)  can  have  at  most  25  nonsero  elements.  Furthermore, 
there  is  often  considerable  overlap  in  the  positions  of  nonsero  elements  in  the  Hessians  of  different 
constraints.  Thus,  in  practice  the  Hessian  of  the  Lagranffan  function  it  often  very  sparse. 

The  usual  approach  to  solving  NCP  is  to  construct  a  sequence  of  unconstrained  or  linewly 
constrained  subproblems  whose  solutions  converge  to  that  of  NCP.  Early  methods  included 
unconstrained  subproblems  based  on  penalty  and  barrier  functions  (see  Fiacco  and  McCormick, 
1968).  Unfortunately,  these  methods  suffer  from  inevitable  ill-con^tioning;  thoy  have  for  the 
most  part  been  superseded  by  more  efficient  methods. 

4.1.  Augmented  LagruglaB  methods.  Augmented  Lagrangian  methods  were  motivated  in  large 
part  the  availability  of  good  methods  for  unconstrained  optimisation.  The  original  idea  was 
to  minimiift  an  approximation  to  the  Lagrangian  function  that  has  been  suitably  augmented  (by 
a  penalty  term)  so  that  the  solution  is  a  local  unconstrained  minimum  of  the  augmented  function 
(Hestenes,  1969;  Powell,  1969). 

In  particular,  an  augment  Lagran^an  method  can  be  defined  in  which  ss+i  is  taken  as 
the  solution  of  tlm  subproUem 


mteimise  Mx,Xk,Pft) 

•cn* 

subject  to  t<x^u, 
whme  the  augmented  Lagrangfan  function  La  ii  defined  ity 

M*.  K  f)  =  F(*)  —  X’«(f)  + 


(4.1) 


(4.« 


-I.. 
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The  Tector  X*  it  an  eiiimate  of  the  Lagrange  multiplier  vector,  and  ptt  it  a  tnitably  choten  non- 
negative  tealar.  Alternatively,  it  it  pottible  to  treat  any  general  linear  conttraintt  by  an  active-tet 
method  (Section  3.1),  and  to  include  only  nonlinear  conttr^tt  in  the  augmented  Lagrangian 
function.  Whatever  the  definition  of  the  tubproblem,  the  algorithm  hat  a  two-Ierel  ttructure  — 
"outer”  iterationt  (corretponding  to  diiferent  tubproblemt)  and  "inner”  iterationt  (within  each 
tubproblem). 

The  Hettian  of  interett  when  tolving  (4.1)  it  the  Hetsian  of  (4.2),  which  it  W{x,  X*)  -}- 
Pjki4(x)^A(x).  If  the  only  conttant  elementt  of  the  Jacobian  matrix  are  lero,  the  tparsity  patterns 
of  W{z,  X)  and  the  Hessian  matrix  of  La  are  generally  identical.  Hence,  techniques  designed  to 
use  an  explicit  sparse  Hettian  may  be  applied  to  (4.1). 

The  Jacobian  matrix  A(z)  need  not  be  stored  explicitly  in  order  to  solve  the  tubproblem  (4.1). 
If  a  fairly  accurate  solution  of  (4.1)  is  computed,  an  improved  Lagrange  multiplier  estimate  may  be 
obtained  without  tolving  any  linear  systems  involving  A(z).  However,  in  several  recent  augmented 
Lagrangian  methods,  (4.1)  is  solved  only  to  low  accuracy  in  order  to  avoid  expending  function 
evaluations  when  X*  it  a  poor  estimate  of  the  optimal  multipliers;  in  this  cate,  some  factorization 
of  the  matrix  A(zk+i)  is  required  to  obtain  an  improved  Lagrange  multiplier  estimate  (by  solving 
either  a  linear  system  or  a  linear  least-squares  problem).  The  relevance  of  the  storage  needed  for 
the  Jacobian  and/or  a  factorization  depends  on  the  number  of  nonlinear  constraints  and  the 
sparsity  of  the  Jacobian. 

4.3.  LIneariy  constrained  tubproblemt.  The  solution  of  NCP  is  a  minimum  of  the  Lagrangian 
function  in  the  subspace  defined  by  the  gradients  of  the  active  constraints.  This  property  leads  to 
a  class  of  methods  in  which  linearizations  of  the  nonlinear  constraints  are  used  to  define  a  linearly 
constrained  subproblem,  of  the  form 

minimise  F(z)  -  X[  (c(z)  -  aJz) 

subject  to  A*(z  —  z*)  =  — c* 
i<  X  <  u, 

where  cu  and  As  denote  c(zs)  and  A(zs)  (Robinson,  1972;  Rosen  and  Kreuser,  1972).  W^th  this 
formulation,  the  Lagrange  multipliers  of  the  k-ih  subproblem  may  be  taken  as  the  multiplier 
estimate  \k+t  in  defining  the  next  subproblem,  and  will  converge  to  the  true  multipliers  at  the 
solution.  When  e(z)  contains  both  linear  and  nonlinear  functions,  only  the  nonlinear  functions 
need  be  included  in  the  objective  function  of  (4.3).  Under  suitable  assumption^,  the  solutions  of  the 
subproblemt  converge  quadratically  to  the  solution  of  NCP.  A  further  benefit  of  the  subproblem 
(4.3)  it  that  linear  constraints  may  be  treated  explicitly. 

One  of  the  important  conditions  for  convergence  with  the  subproblems  (4.3)  is  a  "sufficiently 
close”  starting  point;  thus,  some  procedure  must  be  used  to  prevent  divergence  from  a  poor  value 
of  zq.  Rosen  (1980)  suggested  a  two-phase  approach,  starting  with  a  penalty  function  method. 
In  the  MINOS/AUGMENTED  system  of  Murtagh  and  Saunders  (1982),  the  objective  function  of 
the  subproblem  is  defined  as  a  modified  augmented  Lagrangian  of  the  form 

l.(x,  X», «)  S  jr(»)  -  xrj»(*)  +  (*)»?*(*),  (4.4) 

where 

ik(z)  =  e(z)  —  (cfc  +  A*(z  —  z*)). 
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The  speriity  pattern  of  La  it  identical  to  that  of  lV(z,  X),  irretpectife  of  a^y  noniero  conttant 
elements  in  the  Jacobian  matrix. 

Methods  based  on  tolying  (4.3)  have  several  benefits  for  sparse  problems.  The  ability  to 
treat  linear  constraints  explicitly  it  helpfhl  for  the  many  large  problems  in  which  most  of  the 
constraints  are  linear.  As  noted  in  the  Introduction,  it  is  often  a  feature  of  sparse  problems 
that  the  cost  of  evaluating  the  problem  functions  it  dominated  by  the  sparse  matrix  operations. 
The  superiority  of  SQP  methods  (Section  4.3.2)  for  dense  problems  results  from  the  generally 
lower  number  of  function  evaluations  compared  to  methods  based  on  (4.3);  for  sparse  problems, 
however,  the  function  evaluations  required  to  solve  (4.3)  may  be  insignificant  compared  to  the 
savings  that  would  result  from  solving  fewer  subproblems,  an  active-set  method  of  the  type 
described  in  Section  3.3.1  is  applied  to  (4.3),  only  the  projected  Hessian  needs  to  be  stored  (rather 
than  the  full  Hessian).  Thus,  methods  based  on  (4.3)  will  tend  to  be  more  effective  than  augmented 
Lagrangian  methods  for  problems  in  which  the  Hessian  of  the  Lagrangian  function  is  not  sparse 
and  the  projected  Hessian  can  be  stored  as  a  dense  matrix. 

4.3.  Methods  based  on  linear  and  qnadraUe  programming.  We  now  consider  two  classes  of 
methods  in  which  the  subproblems  are  solved  without  evaluation  of  the  problem  functions  (in 
contrast  to  the  methods  of  Sections  4.1  and  4.2). 

4.3.1.  Sequential  linear  programming  mrthods.  Because  of  the  availability  and  high  quality  of 
software  for  sparse  linear  programs,  a  populm  technique  for  solving  large-scale  problems  has  been 
to  choose  each  iterate  as  the  solution  of  an  LP  subproblem;  we  shall  call  these  sequential  linear 
programming  (SLP)  methods.  They  were  first  proposed  by  Griffith  and  Stewart  (1961);  for  a 
recent  survey,  see  Palacios-Gomei,  Lasdon  and  Engquist  (1982). 

One  crucial  issue  in  an  SLP  method  is  the  definition  of  the  linear  functions  in  the  subproblem. 
A  typical  formulation  is 


minimise  gl  {z  —  Zk) 

subject  to  A*(*  —  Zk)  =  — c* 
t<z<u. 

W^th  some  formulations,  the  LP  may  not  be  well  posed  —  for  example,  there  may  be  fewer 
constraints  than  variables.  The  usual  vraj  of  ensuring  a  correctly  posed  subproblem  is  to  include 
additional  constraints  on  the  variables,  such  as  bounds  on  the  change  in  each  variable.  In  general, 
the  latter  are  also  needed  to  ensure  convergence. 

SLP  methods  have  the  advantage  that  the  subproblems  can  be  solved  using  all  the  technology 
of  sparse  LP  codes.  They  tend  to  be  efficient  on  two  types  of  problems:  those  with  nearly  linear 
functions,  particularly  slightly  perturbed  linear  programs;  and  those  in  which  the  functions  can 
be  closely  approximated  by  piecewise  linear  functions  (e.g.,  the  objective  function  is  separable  and 
convex).  Unfortunately,  on  general  problems  SLP  method  are  at  best  linearly  convergent  unless 
the  number  of  active  constraints  at  the  solution  is  equal  to  the  number  of  variables.  Furthermore, 
the  speed  of  convergence  critically  depends  on  the  technique  that  defines  each  subproblem. 

Recently,  some  of  the  techniques  used  in  SQP  methods  (Section  4.3.2)  have  been  applied  to 
the  SLP  approach  —  such  as  the  use  of  a  merit  function  to  ensure  progress  after  each  outer 
it«ation.  Such  techniques  cannot  be  expected  to  improve  the  atymptotic  rate  of  convergence  of 
SLP  methods,  but  thty  should  improve  robustness  and  overall  efflsctiveness. 
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BmIs  (1978)  has  gi-reii  a  method  that  is  designed  to  make  extensile  use  of  an  existing  LP 
Ijstam.  The  nonlinearlj  constrained  problem  is  assumed  to  be  of  the  form 

minimise  e(z)^p 

s.S 

subject  to  =  K^)  (4  5) 

t<  X  <u 

V  <  y  <  w. 

A  special  nonlinear  algorithm  is  then  used  to  adjust  z;  for  each  value  of  z,  a  new  estimate  y  is 
determined  bj  solving  an  LP. 

4^3.  SequenUal  qoadratie  programming  methods.  The  most  popular  methods  in  recent  years  for 
dense  nonlinearly  constrained  problems  are  based  on  solving  a  sequence  of  quadratic  programming 
subproblems  (see  Powell,  1982,  for  a  survey).  At  iteration  k,  a  typical  QP  subproblem  has  the 
form 


minimise 

*6«* 

subject  to 


+  gl? 

Akp  =  — c* 


where  Hk  is  an  approximation  to  the  Hessian  of  the  Lagrangian  function.  The  solution  of  the 
QP  subproblem  is  then  used  as  the  search  direction  pk  in  (1>1).  The  step  a*  is  chosen  to  achieve 
a  suitable  reduction  in  some  merit  function  that  measures  progress  toward  the  solution.  In  the 
dense  case,  the  most  popular  method  is  based  on  taking  Hjt  as  a  positive-definite  quasi-Newton 
approximation  to  the  Hessian  (Powell,  1977).  However,  the  many  options  in  defining  the  QP 
subproblem  have  yet  to  be  fully  understood  and  resolved  (see  Murray  and  Wright,  1982,  for  a 
discussion  of  some  of  the  critical  issues). 

Further  complex  issues  are  raised  when  applying  an  SQP  method  to  sparse  problems  (see,  e.g., 
Gill  et  a/.,  1981).  The  general  development  of  methods  has  been  hampered  because  methods  for 
sparse  quadratic  programming  are  only  just  being  developed,  and  are  not  yet  generally  available 
for  use  within  a  general  nonlinear  algorithm.  However,  Escudero  (1980)  has  reported  some  success 
with  an  SQP  implementation  in  which  a  sparse  quasi-Newton  approximation  is  used  for  H/t  (see 
also  Section  3.7.2). 
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Sparse  Matrix  Mathods  In  Optlaliatlon 

by  P.K.  6111,  W.  Murray,  M.A<  Saunders  and  M.H*  Wright 


Optlalsatlon  algorlthns  typically  require  the  solution  of  nany  systaaa 
of  linear  equations  Bur^  "  h^*  When  large  numbers  of  variables  or 
constraints  are  present,  these  linear  systems  could  account  for  much  of  the 
total  computation  time. 

Both  direct  and  Iterative  equation  solvers  are  needed  In  practice. 
Unfortunately,  most  off-the-shelf  solvers  are  designed  for  single  systems, 
whereas  optl^satlon  problems  give  rise  to  hundreds  or  thousands  of 
systems.  To  avoid  refactorlzatlon,  or  to  speed  the  convergence  of  an 
} Iterative  method.  It  Is  essential  to  note  that  B|^  la  related  to 
*k-l* 

We  review  various  sparse  matrices  that  arise  In  optimisation,  and 
discuss  compromises  that  are  currently  being  made  In  dealing  with  then. 
Since  significant  advances  continue  to  be  made  with  single-system  solvers, 
we  give  special  attention  to  methods  that  allow  such  solvers  to  be  used 
repeatedly  on  a  sequence  of  modified  systems  (e.g.,  the  product-form 
update;  use  of  the  Schur  complement).  The  speed  of  factorising  a  matrix 
then  becomes  relatively  less  important  than  tha  efficiency  of  subsequent 
solves  with  very  auny  right-hand  sides. 

At  the  sane  time,  we  hope  that  future  iaq^rovements  to  linear-equation 
software  will  be  oriented  more  specifically  to  the  case  of  related  matrices 
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